Unlock R's data magic: split data frames cleverly and breeze through complex data analysis!
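Introduction
This article shows how to split one variable (column) of a data frame into several columns, and how to run calculations by row, by column, and by grouping variable.
Splitting a data frame
The separate() function from the tidyr package splits the information held in a single column into several new columns.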
separate() usage
## separate usage
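separate(
  data,                   # data frame to process
  col,                    # column to split
  into,                   # character vector of names for the new columns
  sep = "[^[:alnum:]]+",  # separator, read as a regular expression; the default "[^[:alnum:]]+" splits at any run of characters that are not letters or digits
  remove = TRUE,          # logical: drop the original column from the result?
  convert = FALSE,        # logical: convert the new columns to a more specific type (integer, numeric, logical, ...)?
  extra = "warn",         # what to do when a row has too many pieces: "warn" (default; warn and drop the extras), "drop" (drop the extras silently), "merge" (split at most length(into) times, so the last column keeps the remainder)
  fill = "warn",          # what to do when a row has too few pieces: "warn" (default; warn and pad with NA on the right), "right" (pad with NA on the right), "left" (pad with NA on the left)
  ...
)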
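The effect of extra and fill is easiest to see side by side. The following is a minimal sketch on a hypothetical toy data frame (toy, and the column names v1 to v3, are made up for illustration and are not from the article), with rows that split into three, two, and four pieces:

```r
library(tidyr)

# Hypothetical toy data: the rows split into 3, 2, and 4 pieces respectively
toy <- data.frame(id = 1:3, v = c("a-b-c", "a-b", "a-b-c-d"))

# extra = "drop": pieces beyond the third are discarded
# fill = "right": short rows are padded with NA on the right
res_drop <- separate(toy, v, into = c("v1", "v2", "v3"), sep = "-",
                     extra = "drop", fill = "right")

# extra = "merge": the last column keeps everything that is left over
res_merge <- separate(toy, v, into = c("v1", "v2", "v3"), sep = "-",
                      extra = "merge", fill = "right")

# fill = "left": short rows are padded with NA on the left instead
res_left <- separate(toy, v, into = c("v1", "v2", "v3"), sep = "-",
                     extra = "drop", fill = "left")

print(res_drop)
print(res_merge)
print(res_left)
```

For the four-piece row, v3 is "c" under "drop" and "c-d" under "merge". For the two-piece row, the NA lands in v3 under fill = "right" and in v1 under fill = "left".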
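Example: splitting a data frame
## R data processing basics: splitting a table
## Time: 20240321
## Author: xiaohe
## Notes: zhenzhen
## Build the data
data <- data.frame(x = 1:6,
                   y = c("A,B|C", "D/E", "F+G", "H(I-J", "K L[M", "N*O@P"))
## Split y at every run of non-alphanumeric characters (the default sep).
## Rows with fewer than three pieces get NA on the right (fill = "right");
## any pieces beyond the third would be merged into y_3 (extra = "merge").
data_separate_right <- tidyr::separate(data, y, into = paste0("y_", 1:3),
                                       extra = "merge", fill = "right")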
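The remove and convert arguments are not used in the example above. As a rough illustration, assuming a hypothetical data frame of date strings (dates and its column names are invented here), the sketch below keeps the original column and lets separate() convert the new columns to integers:

```r
library(tidyr)

# Hypothetical data: ISO-style date strings (not from the article)
dates <- data.frame(id = 1:2, date = c("2024-03-21", "2023-12-05"))

# remove = FALSE keeps the original `date` column;
# convert = TRUE turns the character pieces into integers where possible
dates_split <- separate(dates, date, into = c("year", "month", "day"),
                        sep = "-", remove = FALSE, convert = TRUE)

str(dates_split)
```

Note that conversion drops leading zeros: the month piece "03" becomes the integer 3.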
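Row-wise and column-wise calculations on a data frame
By row
## R data processing basics: row-wise and column-wise calculations on a data frame
## Time: 20240321
## Author: xiaohe
## Notes: zhenzhen
## Build the data
data <- data.frame(x = c("a", "b", "c", "d"),
                   y = c(1, 3, 7, 9),
                   z = c(2, 4, 6, 8))
## Sum each row of the numeric columns y and z (the 1 means "apply over rows")
data_row_sum <- apply(data[, 2:3], 1, sum)
## Print the result
print(data_row_sum)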
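For sums and means specifically, base R also provides rowSums() and rowMeans(), which are usually faster than apply() and give the same numbers. A minimal sketch on the same data (the names num_cols, row_sum_apply, row_sum_fast and row_mean are chosen here for illustration):

```r
data <- data.frame(x = c("a", "b", "c", "d"),
                   y = c(1, 3, 7, 9),
                   z = c(2, 4, 6, 8))

# Keep only the numeric columns, selected by name rather than by position
num_cols <- data[, c("y", "z")]

row_sum_apply <- apply(num_cols, 1, sum)  # the article's approach
row_sum_fast  <- rowSums(num_cols)        # dedicated base-R function
row_mean      <- rowMeans(num_cols)       # row-wise means

print(row_sum_fast)
print(row_mean)
```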
By column
## Build the data
data <- data.frame(x = c("a", "b", "c", "d"),
                   y = c(1, 3, 7, 9),
                   z = c(2, 4, 6, 8))
## Mean of each numeric column (the 2 means "apply over columns")
data_col_mean <- apply(data[, 2:3], 2, mean)
## Print the result
print(data_col_mean)
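The examples pick columns 2 and 3 by position because apply() first converts the data frame to a matrix, and including the character column x would turn every value into a string. One possible way to avoid hard-coding positions, sketched here with an illustrative helper vector is_num, is to select the numeric columns programmatically:

```r
data <- data.frame(x = c("a", "b", "c", "d"),
                   y = c(1, 3, 7, 9),
                   z = c(2, 4, 6, 8))

# Logical vector flagging the numeric columns (x is character, y and z are numeric)
is_num <- sapply(data, is.numeric)

# drop = FALSE keeps a data frame even if only one column is numeric
col_mean <- colMeans(data[, is_num, drop = FALSE])

print(col_mean)
```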
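Grouped calculations on a data frame (method 1)
Grouping by one variable
## R data processing basics: grouped calculations on a data frame (method 1)
## Time: 20240321
## Author: xiaohe
## Notes: zhenzhen
## Build the data
data <- data.frame(x = c("a", "b", "c", "a", "b", "a", "b", "c", "a", "b"),
                   y = c("male", "male", "male", "male", "male",
                         "female", "female", "female", "female", "female"),
                   z = c(3, 7, 8, 6, 13, 11, 5, 2, 17, 14))
# Group by one variable (y here) and compute the median of z in each group
data_aggregate_1 <- aggregate(z ~ y, data, median)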
Grouping by two variables
## Build the data
data <- data.frame(x = c("a", "b", "c", "a", "b", "a", "b", "c", "a", "b"),
                   y = c("male", "male", "male", "male", "male",
                         "female", "female", "female", "female", "female"),
                   z = c(3, 7, 8, 6, 13, 11, 5, 2, 17, 14))
# Group by two variables (x and y here) and sum z within each combination
data_aggregate_2 <- aggregate(z ~ x + y, data, sum)
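As a cross-check on the aggregate() results, base R's tapply() computes the same grouped statistics but returns a named vector (one grouping variable) or a matrix (two grouping variables) instead of a data frame. This is a sketch of an alternative, not the article's method. The names med_by_y and sum_by_xy are illustrative:

```r
data <- data.frame(x = c("a", "b", "c", "a", "b", "a", "b", "c", "a", "b"),
                   y = c("male", "male", "male", "male", "male",
                         "female", "female", "female", "female", "female"),
                   z = c(3, 7, 8, 6, 13, 11, 5, 2, 17, 14))

# Median of z for each level of y: a named vector (female, male)
med_by_y <- tapply(data$z, data$y, median)

# Sum of z for each x/y combination: a matrix with x in rows and y in columns
sum_by_xy <- tapply(data$z, list(data$x, data$y), sum)

print(med_by_y)
print(sum_by_xy)
```

On this data the medians are 11 for female and 7 for male, matching data_aggregate_1.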
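Grouped calculations on a data frame (method 2)
① Group the data frame with group_by() from the dplyr package; ② compute the statistics with another function, such as summarise().
Grouping by one variable
## R data processing basics: grouped calculations on a data frame (method 2)
## Time: 20240321
## Author: xiaohe
## Notes: zhenzhen
## Build the data
data <- data.frame(x = c("a", "b", "c", "a", "b", "a", "b", "c", "a", "b"),
                   y = c("male", "male", "male", "male", "male",
                         "female", "female", "female", "female", "female"),
                   z = c(3, 7, 8, 6, 13, 11, 5, 2, 17, 14))
# Group by one variable (y here)
library(dplyr)
data_group_1 <- data %>%          # %>% is the pipe operator
  group_by(y) %>%                 # group by y
  summarise(median = median(z))   # median of z in each group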
Grouping by two variables
## Build the data
data <- data.frame(x = c("a", "b", "c", "a", "b", "a", "b", "c", "a", "b"),
                   y = c("male", "male", "male", "male", "male",
                         "female", "female", "female", "female", "female"),
                   z = c(3, 7, 8, 6, 13, 11, 5, 2, 17, 14))
# Group by two variables (x and y here)
library(dplyr)
data_group_2 <- data %>%                    # %>% is the pipe operator
  group_by(x, y) %>%                        # group by x and y
  summarise(sum = sum(z), .groups = "drop") # sum of z per group; return an ungrouped result
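One convenience of the group_by() plus summarise() approach is that several statistics can be computed in a single call. A minimal sketch on the same data (the output column names n, median and total are chosen here for illustration):

```r
library(dplyr)

data <- data.frame(x = c("a", "b", "c", "a", "b", "a", "b", "c", "a", "b"),
                   y = c("male", "male", "male", "male", "male",
                         "female", "female", "female", "female", "female"),
                   z = c(3, 7, 8, 6, 13, 11, 5, 2, 17, 14))

data_group_stats <- data %>%
  group_by(y) %>%
  summarise(n = n(),               # number of rows in each group
            median = median(z),    # median of z
            total = sum(z))        # sum of z

print(data_group_stats)
```

Each group has five rows. The totals are 49 for female and 37 for male.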